AITopics | pu data

pu data

Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data

Liu, Jialei, Liao, Jun, Fang, Kuangnan

arXiv.org Machine Learning, Nov-17-2025

Positive-Unlabeled (PU) learning presents unique challenges due to the lack of explicitly labeled negative samples, particularly in high-stakes domains such as fraud detection and medical diagnosis. To address data scarcity and privacy constraints, we propose a novel framework that combines transfer learning with model averaging and integrates information from heterogeneous data sources (fully binary labeled, semi-supervised, and PU data sets) without direct data sharing. For each source domain type, a tailored logistic regression model is fitted, and knowledge is transferred to the PU target domain through model averaging. Optimal weights for combining source models are determined via a cross-validation criterion that minimizes the Kullback-Leibler divergence. We establish theoretical guarantees for weight optimality and convergence, covering both misspecified and correctly specified target models, with further extensions to high-dimensional settings using sparsity-penalized estimators. Extensive simulations and real-world credit risk data analyses demonstrate that our method outperforms competing methods in predictive accuracy and robustness, especially under limited labeled data and heterogeneous environments.
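
The weight-selection step can be illustrated with a deliberately simplified sketch. It assumes two hypothetical source models that output probabilities and a fully labeled held-out set, which is not the PU target setting of the paper, so it should not be read as the authors' criterion. It relies only on the fact that minimizing the Kullback-Leibler divergence to the true label distribution is equivalent to minimizing the average negative log-likelihood. All names (`best_averaging_weight`, `p_a`, `p_b`) are invented for illustration.

```python
import numpy as np

def neg_log_likelihood(y, p, eps=1e-12):
    # Average Bernoulli negative log-likelihood; equals the KL divergence
    # to the true label distribution up to a constant that does not depend on p.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def best_averaging_weight(y_val, p_a, p_b, grid=101):
    # Grid search over w in [0, 1] for the averaged prediction w*p_a + (1-w)*p_b.
    weights = np.linspace(0.0, 1.0, grid)
    losses = [neg_log_likelihood(y_val, w * p_a + (1 - w) * p_b) for w in weights]
    best = int(np.argmin(losses))
    return weights[best], losses[best]

# Synthetic held-out data (assumption: labels are fully observed here).
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y_val = (rng.random(5000) < 1 / (1 + np.exp(-2 * x))).astype(float)

p_a = 1 / (1 + np.exp(-2 * x))   # hypothetical well-specified source model
p_b = np.full_like(x, 0.5)       # hypothetical uninformative source model

w, loss = best_averaging_weight(y_val, p_a, p_b)
print("weight on model A:", w, "held-out loss:", round(loss, 4))
```

In this toy setup the search should place most of the weight on the informative model. With more than two source models, the grid would typically be replaced by a constrained optimizer over the probability simplex.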

artificial intelligence, estimator, machine learning, (17 more...)

2511.10919

Country:

Asia > China (0.68)
North America > United States (0.46)

Genre: Research Report > New Finding (0.88)

Industry:

Banking & Finance (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Class prior estimation for positive-unlabeled learning when label shift occurs

Mielniczuk, Jan, Rejchel, Wojciech, Teisseyre, Paweł

arXiv.org Machine Learning, Feb-28-2025

We study estimation of the class prior for unlabeled target samples, which may differ from that of the source population. It is assumed that for the source data only samples from the positive class and from the whole population are available (the PU learning scenario). We introduce a novel direct estimator of the class prior which avoids estimation of posterior probabilities and has a simple geometric interpretation. It is based on a distribution matching technique together with kernel embedding and is obtained as an explicit solution to an optimisation task. We establish its asymptotic consistency as well as a non-asymptotic bound on its deviation from the unknown prior, which is calculable in practice. We study finite sample behaviour for synthetic and real data and show that the proposal, together with a suitably modified version for large values of the source prior, performs on par with or better than its competitors.

dataset, estimation, estimator, (13 more...)

2502.21194

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
Europe > Poland > Kuyavian-Pomeranian Province > Toruń (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Reviews: Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning

Neural Information Processing Systems, Feb-11-2025, 20:15:02 GMT

The basic problem studied in the paper concerns learning from data which is only partially labelled, but nonetheless doing better than with fully labelled data. In the particular scenario where one has a pool of unlabelled data in lieu of one of the classes, the paper seeks to quantify the impact this has on the estimation error of the learned classifier. Determining the degradation or lack thereof when learning from unlabelled data is interesting, and thus the paper seems well motivated. The machinery used to illustrate its messages is fairly standard (the key quantities, namely the estimation errors for each scenario, are derived from a simple Rademacher analysis); however, the final results appear novel, with implications worked through in various scenarios. The paper hinges on the simple facts that (a) different risks (for PN/PU/NU learning) may be seen as employing different weightings on individual risks for the positive and negative class, and (b) these weightings are reflected in appropriate terms for the estimation error.
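
Point (a) can be made concrete with a small numerical check. The sketch below is an illustration under stated assumptions, not the reviewed paper's derivation: it uses the standard identity that, because the unlabelled density is the mixture p(x) = π p₊(x) + (1 − π) p₋(x), the negative-class risk can be rewritten using only positive and unlabelled samples. The Gaussian class distributions, the class prior `prior`, the logistic loss, and the fixed scorer `score` are all arbitrary choices made for the example.

```python
import numpy as np

def logistic_loss(margin):
    # Loss of a real-valued score against a +1/-1 label, as a function of label * score.
    return np.log1p(np.exp(-margin))

def score(x):
    # A fixed, arbitrary linear scorer; any other function would do for the identity.
    return 1.5 * x

rng = np.random.default_rng(0)
prior = 0.4          # assumed class prior P(y = +1)
n = 400_000

x_pos = rng.normal(+1.0, 1.0, n)                 # samples from the positive class
x_neg = rng.normal(-1.0, 1.0, n)                 # samples from the negative class
is_pos = rng.random(n) < prior
x_unl = np.where(is_pos,                         # unlabelled samples: a mixture of both
                 rng.normal(+1.0, 1.0, n),
                 rng.normal(-1.0, 1.0, n))

risk_pos_as_pos = logistic_loss(+score(x_pos)).mean()
risk_pos_as_neg = logistic_loss(-score(x_pos)).mean()
risk_neg_as_neg = logistic_loss(-score(x_neg)).mean()
risk_unl_as_neg = logistic_loss(-score(x_unl)).mean()

# PN risk: weights prior and (1 - prior) on the two class-conditional risks.
pn_risk = prior * risk_pos_as_pos + (1 - prior) * risk_neg_as_neg
# PU rewriting: the negative-class term is replaced by unlabelled and positive terms.
pu_risk = prior * risk_pos_as_pos + risk_unl_as_neg - prior * risk_pos_as_neg

print(round(pn_risk, 3), round(pu_risk, 3))
```

The two printed values should agree up to sampling noise, which shows that the PU form targets the same risk with a different weighting of per-class terms.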

learning, scenario, unlabelled data, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)

Positive and Unlabeled Data: Model, Estimation, Inference, and Classification

Liu, Siyan, Yeh, Chi-Kuang, Zhang, Xin, Tian, Qinglong, Li, Pengfei

arXiv.org Machine Learning, Jul-12-2024

This study introduces a new approach to addressing positive and unlabeled (PU) data through the double exponential tilting model (DETM). Traditional methods often fall short because they only apply to selected completely at random (SCAR) PU data, where the labeled positive and unlabeled positive data are assumed to be from the same distribution. In contrast, our DETM's dual structure effectively accommodates the more complex and underexplored selected at random PU data, where the labeled and unlabeled positive data can be from different distributions. We rigorously establish the theoretical foundations of DETM, including identifiability, parameter estimation, and asymptotic properties. Additionally, we move forward to statistical inference by developing a goodness-of-fit test for the SCAR condition and constructing confidence intervals for the proportion of positive instances in the target domain. We leverage an approximated Bayes classifier for classification tasks, demonstrating DETM's robust performance in prediction. Through theoretical insights and practical applications, this study highlights DETM as a comprehensive framework for addressing the challenges of PU data.

assumption, detm, pu data, (17 more...)

2407.09735

Country: Asia > China (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Meta-learning for Positive-unlabeled Classification

Kumagai, Atsutoshi, Iwata, Tomoharu, Fujiwara, Yasuhiro

arXiv.org Machine Learning, Jun-5-2024

We propose a meta-learning method for positive and unlabeled (PU) classification, which improves the performance of binary classifiers obtained from only PU data in unseen target tasks. PU learning is an important problem since PU data naturally arise in real-world applications such as outlier detection and information retrieval. Existing PU learning methods require large amounts of PU data, but sufficient data are often unavailable in practice. The proposed method minimizes the test classification risk after the model is adapted to PU data by using related tasks that consist of positive, negative, and unlabeled data. We formulate the adaptation as an estimation problem of the Bayes optimal classifier, which is the classifier that minimizes the classification risk. The proposed method embeds each instance into a task-specific space using neural networks. With the embedded PU data, the Bayes optimal classifier is estimated through density-ratio estimation of PU densities, which admits a closed-form solution. The closed-form solution enables us to efficiently and effectively minimize the test classification risk. We empirically show that the proposed method outperforms existing methods on one synthetic and three real-world datasets.

classifier, estimation, neural network, (15 more...)

2406.0368

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Boosting Algorithm for Positive-Unlabeled Learning

Zhao, Yawen, Zhang, Mingzhe, Zhang, Chenhao, Chen, Weitong, Ye, Nan, Xu, Miao

arXiv.org Artificial Intelligence, Dec-7-2022

Positive-unlabeled (PU) learning deals with binary classification problems when only positive (P) and unlabeled (U) data are available. Many recent PU methods are based on neural networks, but little has been done to develop boosting algorithms for PU learning, despite boosting algorithms' strong performance on many fully supervised classification problems. In this paper, we propose a novel boosting algorithm, AdaPU, for PU learning. Similarly to AdaBoost, AdaPU aims to optimize an empirical exponential loss, but the loss is based on the PU data, rather than on positive-negative (PN) data. As in AdaBoost, we learn a weighted combination of weak classifiers by learning one weak classifier and its weight at a time. However, AdaPU requires a very different algorithm for learning the weak classifiers and determining their weights. This is because AdaPU learns a weak classifier and its weight using a weighted positive-negative (PN) dataset in which some data weights are negative: the dataset is derived from the original PU data, and the data weights are determined by the current weighted classifier combination. Our experiments showed that AdaPU outperforms neural networks on several benchmark PU datasets, including a large-scale challenging cyber security dataset.

artificial intelligence, machine learning, unlabeled example, (18 more...)

2205.09485

Country:

Oceania > Australia > Queensland (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning From Positive and Unlabeled Data: A Survey

Bekker, Jessa, Davis, Jesse

arXiv.org Machine Learning, Nov-12-2018

Learning from positive and unlabeled data, or PU learning, is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them.
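
The setting described above can be simulated in a few lines. The sketch below assumes the simplest labeling mechanism, selected completely at random (SCAR), in which every positive example is labeled with the same probability `label_freq`. The function names and parameter values are illustrative and are not taken from the survey. The code checks that the unlabeled set is a mixture of both classes, with positive fraction prior · (1 − c) / (1 − prior · c), which follows directly from Bayes' rule.

```python
import numpy as np

def simulate_scar_pu(n, prior, label_freq, seed=0):
    # y: true class, hidden from the learner; s: observed "labeled" indicator.
    rng = np.random.default_rng(seed)
    y = rng.random(n) < prior
    s = y & (rng.random(n) < label_freq)   # only positives can ever be labeled
    return y, s

def positive_fraction_in_unlabeled(prior, label_freq):
    # P(y = 1 | s = 0) under SCAR, by Bayes' rule.
    return prior * (1 - label_freq) / (1 - prior * label_freq)

y, s = simulate_scar_pu(n=200_000, prior=0.3, label_freq=0.4)
empirical = y[~s].mean()
expected = positive_fraction_in_unlabeled(0.3, 0.4)
print("positives hidden in the unlabeled set:", round(empirical, 3),
      "theory:", round(expected, 3))
```

Treating every unlabeled example as negative would therefore mislabel a sizeable share of positives in this example, which is why PU methods need either the class prior or the labeling frequency.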

artificial intelligence, assumption, machine learning, (17 more...)

1811.0482

Country: Europe (0.46)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Health Care Technology (0.46)
Education > Curriculum > Subject-Specific Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data

Bekker, Jessa, Davis, Jesse

arXiv.org Machine Learning, Sep-10-2018

Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can be enabled in this setting. We propose and theoretically analyze an empirical-risk-based method for incorporating the labeling mechanism. Additionally, we investigate under which assumptions learning is possible when the labeling mechanism is not fully understood and propose a practical method to enable this. Our empirical analysis supports the theoretical results and shows that taking into account the possibility of a selection bias, even when the labeling mechanism is unknown, improves the trained classifiers.
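
One way an empirical risk can incorporate a labeling mechanism is inverse-propensity weighting. The sketch below is an illustration under strong assumptions and is not necessarily the estimator proposed in the paper: it assumes the propensity e(x) = P(labeled | positive, x) is known exactly, and it uses a made-up propensity function, a synthetic data model, and a fixed scorer. Each labeled example counts as a positive with weight 1/e(x) and as a negative with weight 1 − 1/e(x), and each unlabeled example counts as a negative. Since E[s / e(x) | x, y] = y, the weighted risk matches the fully labeled risk in expectation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(margin):
    return np.log1p(np.exp(-margin))

rng = np.random.default_rng(0)
n = 400_000
x = rng.normal(size=n)
y = rng.random(n) < sigmoid(2.0 * x)       # true labels (hidden in practice)
propensity = 0.2 + 0.6 * sigmoid(x)        # hypothetical e(x): biased toward large x
s = y & (rng.random(n) < propensity)       # observed labels with selection bias

score = 1.5 * x                            # fixed, arbitrary scorer
loss_pos = logistic_loss(score)            # loss if the example is treated as positive
loss_neg = logistic_loss(-score)           # loss if the example is treated as negative

# Risk computed with the true labels (unavailable to a PU learner).
full_risk = np.mean(np.where(y, loss_pos, loss_neg))

# Propensity-weighted PU risk: w is s / e(x), zero for unlabeled examples.
w = s / propensity
weighted_pu_risk = np.mean(w * loss_pos + (1.0 - w) * loss_neg)

# Naive PU risk: treats every unlabeled example as a negative.
naive_pu_risk = np.mean(np.where(s, loss_pos, loss_neg))

print(round(full_risk, 3), round(weighted_pu_risk, 3), round(naive_pu_risk, 3))
```

In this simulation the weighted risk should track the fully labeled risk closely, while the naive risk is visibly biased. In practice the propensity is unknown and must itself be estimated.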

artificial intelligence, machine learning, propensity score, (15 more...)

1809.03207

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)

Instance-Dependent PU Learning by Bayesian Optimal Relabeling

He, Fengxiang, Liu, Tongliang, Webb, Geoffrey I, Tao, Dacheng

arXiv.org Machine Learning, Aug-6-2018

When learning from positive and unlabelled data, it is a strong assumption that the positive observations are randomly sampled from the distribution of $X$ conditional on $Y = 1$, where $X$ stands for the feature and $Y$ for the label. Most existing algorithms are designed to be optimal under this assumption. However, in many real-world applications, the observed positive examples depend on the conditional probability $P(Y = 1|X)$ and are therefore sampled with bias. In this paper, we assume that a positive example with a higher $P(Y = 1|X)$ is more likely to be labelled and propose a probabilistic-gap-based PU learning algorithm. Specifically, by treating the unlabelled data as noisy negative examples, we can automatically label a group of positive and negative examples whose labels are identical to the ones assigned by a Bayesian optimal classifier, with a consistency guarantee. The relabelled examples have a biased domain, which is remedied by the kernel mean matching technique. The proposed algorithm is model-free and thus does not have any parameters to tune. Experimental results demonstrate that our method works well on both generated and real-world datasets.

artificial intelligence, classifier, machine learning, (19 more...)

1808.0218

Country:

Oceania > Australia (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Information-Theoretic Representation Learning for Positive-Unlabeled Classification

Sakai, Tomoya, Niu, Gang, Sugiyama, Masashi

arXiv.org Machine Learning, Feb-12-2018

In real-world applications, it is conceivable that only positive and unlabeled (PU) data are available for training a classifier. For instance, in land-cover image classification, images of urban regions can be easily labeled, while images of non-urban regions are difficult to annotate due to the high diversity of non-urban regions containing, e.g., forest, seas, grasses, and soil (Li et al., 2011). To cope with such situations, PU classification has been actively studied (Letouzey et al., 2000; Elkan and Noto, 2008; du Plessis et al., 2015), and the state-of-the-art method allows us to systematically train deep neural networks only from PU data (Kiryo et al., 2017). However, existing PU classification methods typically require an estimate of the class-prior probability, and their performance is sensitive to the quality of class-prior estimation (Kiryo et al., 2017). Although various class-prior estimation methods from PU data have been proposed so far (du Plessis and Sugiyama, 2014; Ramaswamy et al., 2016; Jain et al., 2016; du Plessis et al., 2017; Northcutt et al., 2017), accurate estimation of the class-prior is still highly challenging, particularly for high-dimensional data.

artificial intelligence, machine learning, representation, (19 more...)

1710.05359

Country:

Europe (0.46)
North America > United States (0.28)
Asia > Japan (0.28)

Genre: Research Report (1.00)
